39 research outputs found

Distributed top-k aggregation queries at large

Author: A. Marian
Gerhard Weikum
H. David
I.F. Ilyas
K. Church
K. Schnaitter
Matthias Bender
N. Bruno
Peter Triantafillou
R. Akbarinia
R. Fagin
Ralf Schenkel
S. Chaudhuri
S. Madden
Sebastian Michel
T. Cormen
Thomas Neumann
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

Top-k query processing is a fundamental building block for efficient ranking in a large number of applications. Efficiency is a central issue, especially for distributed settings, where the data is spread across different nodes in a network. This paper introduces novel optimization methods for top-k aggregation queries in such distributed environments. The optimizations can be applied to all algorithms that fall into the frameworks of the prior TPUT and KLEE methods. The optimizations address three degrees of freedom: 1) hierarchically grouping input lists into top-k operator trees and optimizing the tree structure, 2) computing data-adaptive scan depths for different input sources, and 3) data-adaptive sampling of a small subset of input sources in scenarios with hundreds or thousands of query-relevant network nodes. All optimizations are based on a statistical cost model that utilizes local synopses, e.g., in the form of histograms, efficiently computed convolutions, and estimators based on order statistics. The paper presents comprehensive experiments, with three different real-life datasets and using the ns-2 network simulator for a packet-level simulation of a large Internet-style network.
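
For orientation, the baseline that the abstract names, TPUT (three-phase uniform threshold), can be sketched roughly as follows. This is a minimal single-process illustration of the prior TPUT scheme, not of the optimizations this paper introduces. It assumes sum aggregation over non-negative local scores, and the names `tput_top_k` and `kth_largest` are hypothetical.

```python
from collections import defaultdict


def kth_largest(values, k):
    """Return the k-th largest value, or 0.0 if fewer than k values exist."""
    ordered = sorted(values, reverse=True)
    return ordered[k - 1] if len(ordered) >= k else 0.0


def tput_top_k(nodes, k):
    """Sketch of TPUT. `nodes` is a list of dicts: item -> non-negative score."""
    m = len(nodes)
    partial = defaultdict(float)   # item -> sum of the scores reported so far
    seen = defaultdict(set)        # item -> indices of nodes that reported it

    # Phase 1: every node reports its local top-k items.
    for i, scores in enumerate(nodes):
        local_top = sorted(scores.items(), key=lambda kv: -kv[1])[:k]
        for item, score in local_top:
            partial[item] += score
            seen[item].add(i)
    # The k-th largest partial sum is a lower bound on the true k-th score.
    threshold = kth_largest(partial.values(), k) / m

    # Phase 2: every node reports all items scoring at least the threshold.
    for i, scores in enumerate(nodes):
        for item, score in scores.items():
            if score >= threshold and i not in seen[item]:
                partial[item] += score
                seen[item].add(i)
    tau2 = kth_largest(partial.values(), k)
    # An unreported local score is below the threshold, which bounds the total.
    candidates = [
        item for item in partial
        if partial[item] + threshold * (m - len(seen[item])) >= tau2
    ]

    # Phase 3: fetch exact totals for the surviving candidates only.
    exact = {
        item: sum(scores.get(item, 0.0) for scores in nodes)
        for item in candidates
    }
    return sorted(exact.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```

In a real deployment each phase would be one round of messages between a coordinator and the nodes; here the loops over `nodes` stand in for those rounds.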

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Springer - Publisher Connector

Enlighten

MPG.PuRe

In Good Company: Efficient Retrieval of the Top-k Most Relevant Event-Partner Pairs

Author: F Yu
G Salton
IF Ilyas
Ihab F. Ilyas
J Bao
K Schnaitter
N Mamoulis
R Fagin
W Tu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

Crossref

VBN

The Repeatability Experiment of SIGMOD 2008

Author: Afanasiev L. (Loredana)
Arion A.
Dittrich J.
Manegold S. (Stefan)
Manolescu I.
Polyzotis N.
Schnaitter K.
Senellart P.
Shasha D.
Zoupanos S.
Publication venue: A.C.M.
Publication date: 01/01/2008

SIGMOD 2008 was the first database conference that offered to test submitters' programs against their data to verify the experiments published. This paper discusses the rationale for this effort, the community's reaction, our experiences, and advice for future similar efforts.

CWI's Institutional Repository

International Migration, Integration and Social Cohesion online publications

On the Complexity of Query Result Diversification

Author: Abiteboul S.
Adomavicius G.
Agrawal R.
Amer-Yahia S.
Berbeglia G.
Borodin A.
Capannini G.
Chen Z.
Demidova E.
Deng T.
Drosou M.
Durand A.
Fagin R.
Fraternali P.
Gollapudi S.
Hemaspaandra L. A.
Ilyas I. F.
Jin W.
Koutrika G.
Ladner R. E.
Lappas T.
Li C.
Liu Z.
Minack E.
Prokopyev O. A.
Schnaitter K.
Stefanidis K.
Valiant L.
Vardi M. Y.
Vee E.
Vieira M. R.
Xie M.
Yu C.
Zhang M.
Ziegler C.-N.
Publication date: 01/01/2013

Query result diversification is a bi-criteria optimization problem for ranking query results. Given a database D, a query Q and a positive integer k, it is to find a set of k tuples from Q(D) such that the tuples are as relevant as possible to the query, and at the same time, as diverse as possible to each other. Subsets of Q(D) are ranked by an objective function defined in terms of relevance and diversity. Query result diversification has found a variety of applications in databases, information retrieval and operations research. This paper studies the complexity of result diversification for relational queries. We identify three problems in connection with query result diversification: to determine whether there exists a set of k tuples that is ranked above a bound with respect to relevance and diversity, to assess the rank of a given k-element set, and to count how many k-element sets are ranked above a given bound. We study these problems for a variety of query languages and for three objective functions. We establish the upper and lower bounds of these problems, all matching, for both combined complexity and data complexity. We also investigate several special settings of these problems, identifying tractable cases.
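
The three problems in the abstract can be made concrete with a brute-force sketch over a small, already materialized Q(D). Everything specific here is an assumption for illustration: the abstract does not spell out its three objective functions, so a weighted sum of total relevance and pairwise distance is used, and "rank" is taken to mean one plus the number of k-element sets that score strictly higher. The function names are hypothetical.

```python
from itertools import combinations


def objective(subset, relevance, distance, lam=0.5):
    """Assumed objective: weighted sum of relevance and pairwise diversity."""
    rel = sum(relevance[t] for t in subset)
    div = sum(distance(s, t) for s, t in combinations(subset, 2))
    return (1 - lam) * rel + lam * div


def exists_above(results, k, bound, relevance, distance, lam=0.5):
    """Decision problem: is some k-element set valued at least `bound`?"""
    return any(
        objective(s, relevance, distance, lam) >= bound
        for s in combinations(results, k)
    )


def count_above(results, k, bound, relevance, distance, lam=0.5):
    """Counting problem: how many k-element sets are valued at least `bound`?"""
    return sum(
        objective(s, relevance, distance, lam) >= bound
        for s in combinations(results, k)
    )


def rank_of(subset, results, relevance, distance, lam=0.5):
    """Ranking problem: 1 + number of same-size sets scoring strictly higher."""
    own = objective(subset, relevance, distance, lam)
    better = sum(
        objective(s, relevance, distance, lam) > own
        for s in combinations(results, len(subset))
    )
    return 1 + better
```

Each function enumerates all k-element subsets, so the cost grows combinatorially with the size of Q(D); the sketch only fixes the problem definitions and says nothing about the complexity bounds the paper establishes.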

CiteSeerX

Crossref

Edinburgh Research Explorer

Top-k String Auto-Completion with Synonyms

Author: C Xiao
E Hyvönen
F Cai
GB Dantzig
J Lu
JJ Burg
K Schnaitter
PJ Kolesar
R Singh
Y Tsuruoka
Publication venue: Springer International Publishing AG
Publication date: 22/11/2016

Auto-completion is one of the most prominent features of modern information systems. Existing auto-completion solutions provide suggestions based on the beginning of the currently input character sequence (i.e. the prefix). However, in many real applications, one entity often has synonyms or abbreviations. For example, "DBMS" is an abbreviation of "Database Management Systems". In this paper, we study a novel type of auto-completion by using synonyms and abbreviations. We propose three trie-based algorithms to solve the top-k auto-completion with synonyms, each one with different space and time complexity trade-offs. Experiments on large-scale datasets show that it is possible to support effective and efficient synonym-based retrieval of completions of a million strings with thousands of synonym rules at about a microsecond per completion, with a small space overhead (i.e. 160-200 bytes per string).
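
To illustrate the general idea of synonym-aware completion, the sketch below indexes each string under its original form and under variants produced by synonym rules, then answers a prefix query with the k highest-scored strings. It is a deliberately simplified approach and is not claimed to match any of the paper's three algorithms. The rule format (full form, short form), the scores, and all names are assumptions.

```python
import heapq


class TrieNode:
    __slots__ = ("children", "entries")

    def __init__(self):
        self.children = {}    # character -> TrieNode
        self.entries = set()  # (score, original string) for keys ending here


def expand(string, rules):
    """Return the string plus one variant per applicable rule.

    Each variant replaces every occurrence of a rule's full form by its
    short form; rules are not combined with each other.
    """
    variants = {string}
    for full, short in rules:
        if full in string:
            variants.add(string.replace(full, short))
    return variants


def build(strings, rules):
    """Index every variant of every string; `strings` maps string -> score."""
    root = TrieNode()
    for original, score in strings.items():
        for key in expand(original, rules):
            node = root
            for ch in key:
                node = node.children.setdefault(ch, TrieNode())
            node.entries.add((score, original))
    return root


def complete(root, prefix, k):
    """Return up to k original strings whose key or variant starts with prefix."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return []
    found = set()
    stack = [node]
    while stack:  # collect every entry in the subtree below the prefix
        current = stack.pop()
        found |= current.entries
        stack.extend(current.children.values())
    return [original for _, original in heapq.nlargest(k, found)]
```

With the rule ("Database Management Systems", "DBMS"), typing "DBM" would retrieve a string stored as "Database Management Systems course". Scanning the whole subtree per query is the naive part; storing per-node score bounds is one common way to stop early.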

arXiv.org e-Print Archive

Crossref

Helsingin yliopiston digitaalinen arkisto